Add LLM Assistant Functionality for Scoring #39
base: main
Conversation
llmModel is now used for more granular control over which model to use.
src/main/kotlin/ch/uzh/ifi/access/service/CourseConfigImporter.kt
```kotlin
// Polling loop
var attempts = 0
val maxAttempts = 20 // Adjust as needed
```
Can you recommend a reasonable range of values from your experience?
A call to the service can take up to 2-3 seconds for longer messages. I think 20 should be enough for most cases, but in the case of an exam with more students a higher number would be better (maybe 30 or 40). The service then polls for around one minute before giving up, which should only be an issue if more than 30 students submit at exactly the same time and the model is under high load, so it responds more slowly. From my experience, GPT-4o is a bit slower than smaller models like 4o-mini.
In the end this is just an upper limit; it makes no difference if we set it to 100, the service will simply try for longer when checking the status of the evaluation task.
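The trade-off described above (attempt limit vs. how long a slow model gets before the service gives up) can be sketched as a simple bounded polling loop. This is an illustration, not the PR's actual implementation: `pollUntilDone`, the status supplier, and the sleep interval are hypothetical stand-ins.

```java
import java.util.function.Supplier;

public class PollingSketch {
    // Polls a status supplier until it reports "done" or maxAttempts is reached.
    // Returns the number of attempts used, or -1 if it gave up.
    static int pollUntilDone(Supplier<String> fetchStatus, int maxAttempts, long sleepMillis) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            if ("done".equals(fetchStatus.get())) {
                return attempts;
            }
            try {
                Thread.sleep(sleepMillis); // ~2-3 s per call in practice; tiny here for the sketch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1; // upper limit reached; the caller decides how to handle the timeout
    }

    public static void main(String[] args) {
        // Simulated evaluation task that finishes on the 5th status check.
        final int[] calls = {0};
        Supplier<String> fakeStatus = () -> (++calls[0] >= 5) ? "done" : "pending";
        System.out.println(pollUntilDone(fakeStatus, 20, 1)); // finishes within the limit
        calls[0] = 0;
        System.out.println(pollUntilDone(fakeStatus, 3, 1));  // gives up: limit too low
    }
}
```

Because the loop exits as soon as the status flips to "done", raising `maxAttempts` only extends the worst case, which matches the comment that setting it to 100 costs nothing in the common case.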
```kotlin
val assistantResponse = evaluateSubmissionWithAssistant(
    AssistantDTO(
        question = task.llmPrompt ?: task.instructions ?: "No instructions provided",
        answer = studentCode,
```
This I don't understand. It appears that the LLM service is provided with submission.files, which will include both the code and the text answer file? I thought it should contain only the text answer (since that is what the rubrics etc. are targeting). Or is it better to include the student's code? Wouldn't it confuse the model if the code is wrong? Also, I think the ACCESS frontend will need to send the text answer separately (i.e. not as part of submission.files, but for example in a newly added submission.textAnswer, which we would need to add to the Submission model, the DTO, and the frontend). In other words, the frontend needs to:
- check whether the task involves a text answer
- remove the text answer from the files submitted as "regular submission files"
- add the text answer as an extra LLM text answer field to the submission
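The three frontend steps listed above could be sketched roughly as follows. The `extractTextAnswer` helper, the file names, and the idea of keying files by path are all hypothetical; the actual ACCESS Submission model and DTO may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

public class SubmissionSplitSketch {
    // Hypothetical split: pull the text answer out of the submitted files
    // and carry it in a dedicated field, leaving only "regular" files behind.
    static String extractTextAnswer(Map<String, String> files, String textAnswerPath) {
        // Step 1: check whether the task involves a text answer at all
        if (!files.containsKey(textAnswerPath)) return null;
        // Steps 2 + 3: remove it from the regular files and return it separately
        return files.remove(textAnswerPath);
    }

    public static void main(String[] args) {
        Map<String, String> files = new HashMap<>();
        files.put("solution.py", "def f(): ...");
        files.put("answer.md", "The algorithm is O(n log n) because ...");

        String textAnswer = extractTextAnswer(files, "answer.md");
        System.out.println(textAnswer);     // would go into e.g. submission.textAnswer
        System.out.println(files.keySet()); // only regular submission files remain
    }
}
```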
Yes, this was a mistake. I have now changed it to take only the file specified by file path in the config. This way we can always pick the relevant file and send only that.
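The "take only the configured file" behaviour described in this reply could look something like the sketch below; the `TaskFile` shape and the config path parameter are illustrative, not the actual service code.

```java
import java.util.List;
import java.util.Optional;

public class LlmFileSelectionSketch {
    record TaskFile(String path, String content) {}

    // Hypothetical lookup: of all submission files, forward only the one
    // whose path matches the file path given in the task config.
    static Optional<String> selectLlmFile(List<TaskFile> files, String configuredPath) {
        return files.stream()
                .filter(f -> f.path().equals(configuredPath))
                .map(TaskFile::content)
                .findFirst();
    }

    public static void main(String[] args) {
        List<TaskFile> files = List.of(
                new TaskFile("main.py", "print('hi')"),
                new TaskFile("answer.md", "Because the loop runs n times ..."));
        System.out.println(selectLlmFile(files, "answer.md").orElse("<no answer file>"));
        System.out.println(selectLlmFile(files, "missing.md").orElse("<no answer file>"));
    }
}
```

Returning an `Optional` makes the "file not found in the submission" case explicit, which matters if the config path and the submitted file names can ever disagree.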
```java
modelMapper.typeMap(TaskDTO.class, Task.class)
    .addMappings(mapping -> mapping.skip(TaskDTO::getFiles, Task::setFiles));
    .addMappings(mapper -> {
        mapper.skip(Task::setLlmSubmission);
```
Why was it necessary to skip all of these? Normally, CourseConfigImporter can rely on the ModelMapper to take care of most fields, but you manually implemented the mapping in CourseConfigImporter instead. Is that necessary?
This is because Task contains each field separately, while TaskDTO holds the LLM config in one single DTO object.
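This nested-vs-flat mismatch is the kind of mapping a by-name mapper cannot infer on its own, which is why the fields are skipped and copied manually in the importer. A minimal illustration with hypothetical field names (the real TaskDTO and Task classes have more fields):

```java
public class FlatteningSketch {
    // Hypothetical shapes: the DTO nests the LLM settings in one object ...
    static class LlmConfigDTO { String model; String prompt; }
    static class TaskDTO { String slug; LlmConfigDTO llm; }

    // ... while the entity keeps each setting as a separate flat field.
    static class Task { String slug; String llmModel; String llmPrompt; }

    // The manual flattening that an automatic field-by-field mapper would miss:
    static Task toEntity(TaskDTO dto) {
        Task task = new Task();
        task.slug = dto.slug; // same-named fields map trivially
        if (dto.llm != null) {
            task.llmModel = dto.llm.model;   // nested dto.llm.model -> flat task.llmModel
            task.llmPrompt = dto.llm.prompt; // nested dto.llm.prompt -> flat task.llmPrompt
        }
        return task;
    }

    public static void main(String[] args) {
        TaskDTO dto = new TaskDTO();
        dto.slug = "task-1";
        dto.llm = new LlmConfigDTO();
        dto.llm.model = "gpt-4o";
        dto.llm.prompt = "Grade this answer against the rubric.";
        Task task = toEntity(dto);
        System.out.println(task.llmModel + " / " + task.llmPrompt);
    }
}
```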
This PR is part of the Graded-By-AI Master's thesis and allows scoring through LLMs.
The relevant changes are: